Automated Contextual Information Retrieval Based on Multi-Tiered User Modeling and Dynamic Retrieval Strategy

ABSTRACT

Automated contextual information retrieval techniques are provided based on multi-tiered user modeling and a dynamic retrieval strategy. Content relevant to a current message is presented by initially obtaining a multi-tiered user model containing a multi-tiered representation of interactions of a first user with each contact, wherein the multi-tiered representation includes a plurality of topic models each corresponding to interactions between the first user and one contact. The topic models contain a set of topics, each containing topic keywords. Context information is extracted based on content of the current message, a sender and/or a recipient of the current message, and the multi-tiered user model. A retrieval strategy is determined based on the extracted context information. Contextual queries are generated to search the information repositories selected based on the determined retrieval strategy. Content relevant to the current message is presented based on search results from the selected information repositories.

FIELD OF THE INVENTION

The present invention relates generally to the electrical, electronic and computer arts, and, more particularly, to contextual information retrieval (IR) for electronic mail (email) systems.

BACKGROUND OF THE INVENTION

In this age of “email overload,” knowledge workers must allocate a significant amount of time to managing their emails. While a number of techniques have been proposed or suggested for reinventing email systems, the most popular commercially-available products, such as Lotus Notes and Microsoft Exchange, remain essentially unchanged, ‘overloaded’ communication tools.

Knowledge workers often must synthesize and re-use information that was previously created. Knowledge workers typically must leave the context of an email application in order to find the information necessary to reply to email messages. When the user leaves the current context to gather information (for example, from an online resource or local hard drive) it can cause a delay of hours, or even days. This interruption places the response at risk of falling through the cracks and it increases the cognitive burden. The interruption also increases the feelings of email overload on the user since this task is now extending over time and the user must continue to keep track of the task.

A need therefore exists for improved email systems that automatically generate relevant content in context without requiring users to leave the email application.

SUMMARY OF THE INVENTION

Generally, methods and apparatus are provided for automated contextual information retrieval based on multi-tiered user modeling and a dynamic retrieval strategy. According to one aspect of the invention, content relevant to a current message is automatically presented to a first user by obtaining a multi-tiered user model containing a multi-tiered representation of pair-wise interactions of a first user with each of one or more contacts, wherein the multi-tiered representation includes a plurality of topic models, wherein each of the topic models corresponds to a pair-wise interaction between the first user and one of the contacts, wherein each of the topic models contains a set of topics, wherein each topic contains a list of topic keywords; extracting context information of a current message based on a content of the current message, one or more of a sender and a recipient of the current message, and the multi-tiered user model; determining a retrieval strategy based on the extracted context information of the current message; selecting from a set of information repositories for contextual information retrieval based on the determined retrieval strategy, wherein the set of information repositories comprises one or more of people directories, a local memory, one or more online repositories, an email repository and a calendar entry repository; generating one or more contextual queries to search the selected information repositories; and presenting the content relevant to the current message from the selected information repositories based on the one or more contextual queries. The contacts may comprise, for example, email contacts, calendar contacts, groups of email contacts and/or groups of calendar contacts. The current message may comprise, for example, an email message, a text message or a transcribed voice mail message.

The multi-tiered user model can be obtained by extracting and aggregating information from email and calendar content of the first user, and applying statistical techniques to create topic models corresponding to pair-wise interactions of the first user with each of the contacts of the first user.

The contextual queries to search the selected information repositories can be determined, for example, based on the content of the current message, the topic keywords contained in a determined topic of the current message, one or more of the sender and recipient of the current message, and the determined retrieval strategy.

The topic of the current message can be determined, for example, by selecting one or more topic models from the multi-tiered user model based on one or more of a sender and recipient of the current message, and matching the content of the current message to the topics of the selected topic models to find the best topic.

The content relevant to the current message can be determined, for example, by processing the search results from the selected information repositories based on the context of the current message and the determined retrieval strategy.

A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1-3 illustrate an exemplary email exchange for an exemplary email user and associated presentation of tailored and relevant information in accordance with the present invention;

FIG. 4 illustrates an exemplary email interface that incorporates aspects of the present invention;

FIG. 5 is a block diagram of an exemplary automated contextual information retrieval email system incorporating features of the present invention;

FIG. 6 illustrates an exemplary user model incorporating features of the present invention;

FIG. 7 illustrates an exemplary model creation process for the creation of the user model in further detail;

FIG. 8 is a flow chart describing an exemplary implementation of a contextual information retrieval process that incorporates features of the present invention;

FIG. 9 is a sample table comprising an exemplary set of context factors;

FIG. 10 is a sample table comprising an exemplary set of retrieval factors; and

FIG. 11 is a schematic block diagram of an automated contextual information retrieval (IR) system incorporating features of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention provides email methods and systems with automated contextual information retrieval. The disclosed automated contextual information retrieval email systems automatically generate content that is relevant to a current message. According to one aspect of the invention, the disclosed email systems employ contextual information retrieval (IR) techniques informed by user models and email analytics. Generally, a disclosed user modeling component builds a model for each user based on the email and calendar content associated with that user. A disclosed contextual IR component uses the user model, and the information from the current email message, to determine what content is relevant (e.g., documents from local memory or online resources, or information about the sender). The retrieved information is provided “at the user's fingertips” within the email application so that the user doesn't need to leave his or her current context to gather necessary information.

As discussed hereinafter, the disclosed automated contextual information retrieval email system obtains a comprehensive understanding of the current context by augmenting the content of the current message with a cumulative user model, dynamically built from the user's email and calendar content. In addition, the disclosed user model includes a representation of the user's interactions with different persons/groups on different topics, which enables the context-sensitive information needs to be identified with a finer granularity, leading to higher relevancy of the retrieved content. Further, the disclosed automated contextual information retrieval email system dynamically tailors the retrieval strategy (e.g., the sources to search, the types of documents to retrieve, and the criteria for sorting multiple retrieval results), which improves the usefulness of the retrieved results given the limited screen real estate at the interface.

According to one aspect of the invention, the presented content varies, for example, based on the email recipient's relationship with the email sender, the previous interaction between the sender and the recipient, and the topic of the current message.

For example, as discussed further below in conjunction with FIGS. 1-3, the amount of information presented in user profiles may vary based on the relationship between the email user (email recipient) and the person identified in the presented profile (such as an email sender). For example, for a well known sender, the presented information can be limited to basic information about a person such as name, email, and phone number. This serves the purpose of providing the user easy access to the contact information of people with whom the user interacts frequently. For an unknown sender, the presented information can include contact information, photo, geography, and job description.

Exemplary Email Exchanges

FIGS. 1-3 illustrate an exemplary email exchange for an exemplary email user in accordance with the present invention. FIGS. 1-3 illustrate the automatic presentation of relevant information in accordance with aspects of the present invention, and the tailoring of retrieved content to different situations.

FIG. 1 illustrates an exemplary email exchange 100 for Alice according to a first scenario. As shown in FIG. 1, Alice receives an email 110 from an exemplary colleague, Bob, who works on a different team and in a different field. Alice doesn't know Bob and has had no previous interaction with him. Bob asks about Alice's work on System T and wants to know if it can help his own work. As discussed hereinafter, based on the information extracted from the email 110 (e.g., sender, subject, body), collectively referred to as current context 120, and Alice's user model 540, as discussed further below in conjunction with FIG. 5, built from her email and calendar content, the automated contextual information retrieval email system 500 infers that: 1) Alice is unlikely to be familiar with Bob so she may need some background information on Bob (e.g., job description, location of work); 2) Bob is likely to be interested only in high-level information about the topic without much technical detail; 3) Alice is familiar with the topic and may need access to her local and online documents on this topic when generating a response to Bob. The automated contextual information retrieval email system 500 then employs a contextual information retrieval process 800 (FIG. 8) to automatically formulate queries based on the sender name and the topic keywords of the current message to retrieve relevant content 130 from multiple sources for presentation in a user interface 400, such as the exemplary user interface 400 discussed further below in conjunction with FIG. 4. The retrieved information includes a detailed profile of Bob, related files from Alice's hard drive (e.g., presentations, screen shots), and links to related online documents and resources such as the demo videos, and wiki entries for System T. According to a further aspect of the invention, discussed further below in conjunction with FIG. 4, the information is displayed within the email application, so Alice doesn't have to leave her inbox.

FIG. 2 illustrates an exemplary email exchange 200 for Alice according to a second scenario. As shown in FIG. 2, Alice receives an email from Carol. Carol and Alice both work in the area of text analytics but on two different teams. Alice has had infrequent interaction with Carol. Carol wonders if they can find synergy between their projects and collaborate on a prototype that showcases the technologies developed by each team. For this case, as discussed hereinafter, based on the current context 220 and Alice's user model 540, the automated contextual information retrieval email system 500 employs the contextual information retrieval process 800 (FIG. 8) to automatically retrieve: 1) a brief profile description (as a reminder) of Carol; 2) documents on Carol's work; and 3) documents and links for Alice's work which could be shared with Carol, as well as related previous emails between Alice and Carol. Generally, the retrieved content 230 includes greater details in documents (e.g., .doc, .pdf) as well as presentations and videos, relative to the retrieved content associated with FIG. 1.

FIG. 3 illustrates an exemplary email exchange 300 for Alice according to a third scenario. As shown in FIG. 3, Alice receives an email from Dan, who works on the same team as Alice. The email is addressed to the whole team and contains an update of the text processing problem the team has been working on lately. In this case, as discussed hereinafter, based on the current context 320 and Alice's user model 540, the automated contextual information retrieval email system 500 determines that 1) Alice doesn't need photos and background information on the team members; 2) the information Alice may need is on a very specific topic and most likely contains technical details; and 3) recency of information becomes more important because of frequency of interaction. Thus, the automated contextual information retrieval email system 500 employs the contextual information retrieval process 800 (FIG. 8) to retrieve from Alice's user model topics specific to Alice's previous interaction with the team, and uses these items to infer the topic of the current email. The keywords associated with the inferred topic are then used to generate queries for retrieving information 330 comprising recently updated relevant documents from online resources (e.g., new entries/documents added to a team wiki) as well as Alice's hard drive, and recent related emails exchanged between some or all of the team members. Messages that belong to the same email thread can be grouped and presented together in the exemplary user interface 400. For Alice's convenience the automated contextual information retrieval email system 500 also provides basic contact information of the team members (e.g., a phone number).

User Interface

FIG. 4 illustrates an exemplary email interface 400 that incorporates aspects of the present invention. Generally, the interface component 400 (FIG. 4) communicates with the contextual IR component 800 (FIG. 8) to provide the context information extracted from the email application and surface the retrieved information to the user. As shown in FIG. 4, when the user clicks on a particular email message from a list 440 of messages in his/her inbox 410 (or any user created folder) to read the selected message or generate a response to it, the automated contextual information retrieval email system 500 automatically conducts contextual IR based on the information extracted from the message and the user model.

The selected message 450 is displayed in a preview window 420 and the retrieved information about the selected message 450 is displayed in a panel 430. Entries in the panel 430 are organized into sections by type/source. In the exemplary interface of FIG. 4, the section of “people” 460 includes the profiles of people related to the current message. Documents in the “local documents” section 470 are retrieved from the user's hard drive, while documents in the “online documents” section 475 are retrieved from intranet applications and repositories (e.g., communities, wikis, blogs, file-sharing tools and paper/patent databases), for example, using APIs from an enterprise search system. See, for example, I. Ronen et al., “Social Networks and Discovery in the Enterprise (SaND),” SIGIR'09.

The “notes messages” section 480 includes the messages retrieved from the user's email database, for example, with subsections to distinguish between the messages from the same email thread as the current message and the messages from other related threads. An attachment or link indicator (such as a conventional paper clip icon) can be presented next to a message when the content of this message contains one or more file attachments or embedded web links.

Each entry displayed in the exemplary interface 400 can be hyper-linked. Thus, a single mouse click can open the corresponding profile, document, or message with its associated application (e.g., a view within the current email application, or a browser page). The user can also click on a zoom-in icon (such as a conventional magnifying glass icon) next to an entry, to display a slider 490 with key attributes for this entry (e.g., subject, abstract, attachments, and embedded links of an email).

System Architecture

FIG. 5 is a block diagram of an exemplary automated contextual information retrieval email system 500 incorporating features of the present invention. As shown in FIG. 5, the exemplary automated contextual information retrieval email system 500 comprises a user modeling component 550, discussed further below in conjunction with FIGS. 6 and 7, that employs data structures and algorithms for computing and representing user information in a user model 540, such as how often the user interacts with other individuals or groups through emails and meetings and what topics they discuss.

As previously indicated, the contextual IR component 800, discussed further below in conjunction with FIG. 8, takes the generated user model 540 and the current email message 510 as input and outputs information relevant to the current context using the interface 400. Depending on the user information needs inferred from the user model 540 and the current message 510, the relevant information can come from multiple sources including enterprise directories 560, for example, containing user profiles, online resources 570, such as online communities, wikis, blogs, file-sharing tools (or information repositories), the user's hard drive 580 and the user's email and/or calendar databases 590.

User Modeling

In an exemplary embodiment, the user model 540 is created from the user's email and calendar content, encoding information that can help the automated contextual information retrieval email system 500 determine the dynamic, context-sensitive information needs of the user.

User Model Representation

FIG. 6 illustrates an exemplary user model 540 incorporating features of the present invention. As shown in FIG. 6, the exemplary user model 540 encodes multiple tiers of information to represent the user's information at different granularities. For example, basic information is extracted from email and calendar messages, including textual content such as subject and body, as well as metadata about the attached files, the embedded web links, and the persons as email senders/receivers and meeting participants. Aggregate information is created by grouping basic information. Email and calendar messages are grouped into threads by subject. As shown in FIG. 6, persons can be grouped based on their associations with email and calendar messages. Derived information, such as interactions and affiliations, links each person or group that has had interaction with the user to the corresponding set of basic and aggregate information.

Based on the basic, aggregate, and derived information encoded in a user model 540, multiple topic models are created and stored in the user model 540 as well. Each topic model is created based on the aggregate content of the user's interaction within a specific interaction scope. An interaction scope can be an email thread with multiple messages, the interaction with a single person/group, or the user's overall interaction with other people as a whole. A topic model associated with a thread represents the topics discussed in this thread. A topic model associated with a person or group reflects the user's topics of interest specific to this person or group. A general topic model derived from the aggregation of the user's interaction with all others represents the user's overall areas of work. The use of multiple topic models enables a user's topics of interest to be represented at a finer granularity, which yields more accurate inference of the user's context-sensitive information needs, thus resulting in higher relevancy of the retrieved content.

Each topic model contains a set of topics. In an exemplary embodiment, each topic is associated with two types of information: the probability of a word given this topic for all the words, and the probability of this topic given a message for all the messages in the associated interaction scope. The former probability provides a list of representative keywords that describe the topic, while the latter provides a list of messages that are strongly associated with the topic. Topics are derived from content based on statistical language models (see next section for more details).
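By way of illustration only, the two kinds of per-topic information described above can be held in a small data structure. The following Python sketch is not the disclosed implementation; the class name `Topic`, the field names `word_probs` and `message_probs`, and the sample probabilities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    # P(word | topic) for each word (hypothetical field name).
    word_probs: dict
    # P(topic | message) for each message id in the interaction scope.
    message_probs: dict

    def top_keywords(self, n=3):
        """Return the n words with the highest P(word | topic)."""
        ranked = sorted(self.word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
        return [word for word, _ in ranked[:n]]

    def top_messages(self, n=3):
        """Return the n message ids with the highest P(topic | message)."""
        ranked = sorted(self.message_probs.items(), key=lambda kv: (-kv[1], kv[0]))
        return [msg for msg, _ in ranked[:n]]


# Made-up values for illustration.
topic = Topic(
    word_probs={"analytics": 0.30, "text": 0.25, "demo": 0.10, "lunch": 0.01},
    message_probs={"msg-1": 0.9, "msg-2": 0.2, "msg-3": 0.7},
)
```

Here `topic.top_keywords(2)` would supply the representative keywords for the topic, and `topic.top_messages(2)` the messages most strongly associated with it.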

FIG. 6 also illustrates the information encoded in a user model 540. The user is linked to all the persons she or he has had interaction with through emails and calendar, and the groups of persons derived from the lists of email recipients and meeting participants (“Has-Interaction”). Each person is linked to the group she or he is affiliated with (“Is-Affiliated”). There are also group co-member relations among persons in the same group (“Is-GroupCoMembers”). Each person or group is linked to the topic model associated with this person or group (“About-Topics”). Particularly, FIG. 6 shows three topic models specific to User Jie's interaction with Users Jennifer, Shimei, and Zhen, respectively, and a topic model specific to User Jie's interaction with them as a group.

User Model Creation

FIG. 7 illustrates an exemplary model creation process 700 for the creation of the user model 540 in further detail. To build a user model 540, the model creation process 700 first processes the user's email and calendar content 590 during step 710 to extract basic information during step 720. The model creation process 700 creates a basic index 730 using, for example, Lucene (see, for example, http://lucene.apache.org), which may include email and calendar messages, file attachments, web links, and persons. Then, the model creation process 700 further processes basic information to compute aggregate information during step 740, and derived information during step 750, and derives topics during step 760, to create a model index 770 that contains threads, groups, interactions, affiliations, and topic models. In addition, the model creation process 700 computes a set of statistics such as each term's frequency of occurrences in all the messages, and gathers the organizational and social relationship (for example, using an enterprise search system such as described in I. Ronen et al., “Social Networks and Discovery in the Enterprise (SaND),” SIGIR'09) between the user and each person in the basic index. Such information can be stored in the model index as well. Both indices are updated periodically to incorporate information from new messages.

Creating Multiple Topic Models

As previously indicated, a user model 540 contains multiple topic models, each associated with a specific interaction scope. For each interaction scope, the system first creates a text document for each message within this scope by concatenating the subject and body of the message. “Non-content” information (e.g., signature, disclaimer, and text formatting markup) can be removed using heuristic rules. Then, a topic model is created by deriving topics from the collection of these text documents, for example, using a Latent Dirichlet Allocation (LDA) method (see D. Blei et al., “Latent Dirichlet Allocation,” J. of Machine Learning Research, 3:993-1022 (2003)). More specifically, through statistical inference, LDA estimates two topic-related probability distributions: Φ(w, t), the topic-specific word distribution, which describes the probability of a word w given a topic t, and Θ(d, t), the document-specific topic distribution, which describes the probability of a topic t given a document d.

The exemplary embodiment employs the LDA implementation in MALLET (see, A. McCallum, “MALLET: A Machine Learning for Language Toolkit,” http://mallet.cs.umass.edu, 2002) to estimate the main parameters of LDA, i.e., Φ(w, t) and Θ(d, t). Other LDA parameters can be determined empirically (e.g., the number of topics in each topic model is proportional to the number of messages included in the associated interaction scope). To ensure the quality of the derived topics, a person/group-specific topic model is created only when there are at least 10 messages, and a thread-specific topic model requires at least 5 messages. Topic models can be periodically updated to incorporate new messages over time.
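As a rough sketch of the gating described above, the minimum message counts can be expressed as a lookup table. The function names, the scope labels, and in particular the proportionality constant `messages_per_topic` are hypothetical; the text states only that the number of topics is proportional to the number of messages and gives no constant.

```python
# Minimum message counts stated in the text for each interaction scope.
MIN_MESSAGES = {"person": 10, "group": 10, "thread": 5}


def can_build_topic_model(scope_kind, n_messages):
    """True when the scope has enough messages for a topic model.

    Scopes not listed (e.g. the general model) have no stated minimum,
    so none is assumed here.
    """
    return n_messages >= MIN_MESSAGES.get(scope_kind, 0)


def number_of_topics(n_messages, messages_per_topic=10, minimum=1):
    """Topic count proportional to message count (constant is assumed)."""
    return max(minimum, n_messages // messages_per_topic)
```

For example, a thread with 4 messages would not receive its own topic model, while a person-specific scope with 50 messages would, with 5 topics under the assumed constant.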

Contextual Information Retrieval

FIG. 8 is a flow chart describing an exemplary implementation of a contextual information retrieval process 800 that incorporates features of the present invention. As shown in FIG. 8, the exemplary contextual information retrieval process 800 initially receives information extracted from the current email message 810, such as sender, subject, and body. During step 820, as discussed further below in the section entitled “Selecting Relevant Topic Models,” the process 800 selects one or more topic models, which are used to infer the topic of the current message during step 830. Thereafter, during step 840, as discussed further below in conjunction with FIG. 9, the process 800 computes the values of a set of context factors 900, such as whether the current message is threaded, the frequency and recency of the user's interaction with the sender, the user's organizational/social relationship with the sender, and the user's degree of familiarity with the topic of the current message.

Based on the context factors computed during step 840, the process 800 next determines its retrieval strategy during step 850 in the form of a set of retrieval factors 1000, discussed further below in conjunction with FIG. 10. Guided by the retrieval factors 1000, the contextual information retrieval process 800 then formulates queries during step 860 to search one or more sources, processes the results returned from them during step 870, and creates a presentation of relevant information to display to the user during step 880.

As shown in FIG. 8, the contextual information retrieval process 800 employs the user model 540, as well as the enterprise directories 560, online resources 570, user's hard drive 580 and the user's email and/or calendar databases 590, as discussed above in conjunction with FIG. 5. The instances 890 shown in FIG. 8 are discussed below in the section entitled “Determining Retrieval Strategy.”

Selecting Relevant Topic Models

As indicated above, during step 820, the contextual information retrieval process 800 selects one or more topic models. With multiple topic models representing the user's granular topics of interest within different interaction scopes, the problem of which topic model(s) to use for inferring the topic of the current message arises. One solution is to calculate the degree of match between the interaction scope of each topic model and that of the current message, and select the model(s) whose matching score(s) is/are above a threshold.

For a thread-specific interaction scope, the degree of match between the associated topic model and the message is 1 if the model represents the same thread that this message belongs to and 0 otherwise. For a person-specific interaction scope, if the message is an incoming email, its degree of match against the topic model associated with the sender is 1. If the message is an outgoing email, its degree of match against the topic model associated with any of the direct (i.e., not copied on) recipients is 1. All other person-specific topic models receive a matching score of 0. For a group-specific interaction scope, the degree of match between the associated topic model and the message is computed based on the degree of overlap o among members of the model's group g′ and those of the message's group g (which includes all the people associated with the message, i.e., sender, direct and indirect recipients), as well as the average normalized co-membership strength s between members of these two groups (Formulas 1-3):

$\begin{matrix}
{\mathrm{match}(g, g^{\prime}) = 0.5 \times o(g, g^{\prime}) + 0.5 \times s(g, g^{\prime})} & (1) \\
{o(g, g^{\prime}) = \dfrac{\#\mathrm{common\_members}(g, g^{\prime})}{\left( |g| + |g^{\prime}| \right)/2}} & (2) \\
{s(g, g^{\prime}) = \dfrac{\sum\limits_{m \in g}{\sum\limits_{m^{\prime} \in g^{\prime}}\dfrac{\#\mathrm{common\_groups}(m, m^{\prime})}{\max_{m^{\prime\prime} \in g^{\prime}}\left( \#\mathrm{common\_groups}(m, m^{\prime\prime}) \right)}}}{|g| \times |g^{\prime}|}} & (3)
\end{matrix}$

where # denotes “the number of” and |·| denotes the size of the group.

The topic models whose matching scores are above a threshold (for example, 0.6) are considered relevant topic models. If none of the above topic models have a score greater than the threshold, the general topic model (derived from all of the user's email and calendar content) is selected and returned.
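The group-specific matching score of Formulas 1-3 and the threshold-based selection can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the disclosed implementation: groups are modeled as sets of person identifiers, `groups_of` is a hypothetical mapping from each person to the set of groups that person belongs to, and a member with no common groups with anyone in g′ is assumed to contribute 0 to the sum (the text does not address this zero-denominator case).

```python
def overlap(g, g_prime):
    """Formula 2: common members divided by the average group size."""
    if not g or not g_prime:
        return 0.0
    return len(g & g_prime) / ((len(g) + len(g_prime)) / 2)


def co_membership_strength(g, g_prime, groups_of):
    """Formula 3: average normalized co-membership strength."""
    if not g or not g_prime:
        return 0.0
    total = 0.0
    for m in g:
        # Number of groups shared by m and each member of g_prime.
        counts = [
            len(groups_of.get(m, set()) & groups_of.get(mp, set()))
            for mp in g_prime
        ]
        max_count = max(counts)
        if max_count == 0:
            continue  # assumption: no shared groups contributes 0
        total += sum(c / max_count for c in counts)
    return total / (len(g) * len(g_prime))


def match(g, g_prime, groups_of):
    """Formula 1: equal-weight blend of overlap and co-membership."""
    return 0.5 * overlap(g, g_prime) + 0.5 * co_membership_strength(
        g, g_prime, groups_of
    )


def select_relevant_models(scores, threshold=0.6, general="general"):
    """Keep models scoring above the threshold, else fall back to general."""
    relevant = [name for name, score in scores.items() if score > threshold]
    return relevant if relevant else [general]
```

In a usage example with message group {a, b} and model group {a, c}, where a belongs to groups G1 and G2, b to G1, and c to G2, the overlap is 0.5, the co-membership strength is 0.625, and the resulting score of 0.5625 falls below the example threshold of 0.6, so that model alone would not be selected.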

Inferring the Topic of the Current Message

As indicated above, during step 830, the contextual information retrieval process 800 infers the main topic discussed in the current email message by identifying the best topic that can explain the current message from the selected relevant topic models. For each relevant topic model, the system computes the topic distribution of the current email message based on the probability distributions encoded in the model. See, for example, D. Blei et al., “Latent Dirichlet Allocation,” J. of Machine Learning Research, 3:993-1022 (2003) or A. McCallum, “MALLET: A Machine Learning for Language Toolkit,” http://mallet.cs.umass.edu (2002). The topic with the highest probability is considered the best topic within this topic model. Then, the contextual information retrieval process 800 chooses the topic with the highest overall probability among the best topics from all the relevant topic models as the inferred topic of the current message.

The inferred topic t has two types of information associated with it: a list of keywords, and a set of messages. The keywords are the top-ranked words that have the highest topic-specific word probabilities in Φ(w, t). They are used in determining the query terms for retrieval. The messages are the top-ranked documents that have the highest document-specific topic probabilities in Θ(d, t). They are useful for filtering the retrieved messages during result processing.
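The two-stage selection (best topic within each relevant model, then best across models) can be sketched as below. The sketch assumes the per-model topic distributions of the current message have already been computed by an LDA inference step, which is not shown; the function name `infer_topic`, the model labels, and the probabilities are hypothetical.

```python
def infer_topic(distributions):
    """Pick the topic with the highest probability across models.

    distributions maps a topic model name to a dict of
    {topic id: probability of that topic for the current message}.
    Returns (model name, topic id, probability), or None if empty.
    """
    best = None
    for model_name, topic_probs in distributions.items():
        if not topic_probs:
            continue
        # Best topic within this topic model.
        topic_id, prob = max(topic_probs.items(), key=lambda kv: kv[1])
        # Best topic overall among the per-model winners.
        if best is None or prob > best[2]:
            best = (model_name, topic_id, prob)
    return best


# Made-up distributions for two relevant topic models.
example = {
    "sender:bob": {"t0": 0.2, "t1": 0.8},
    "thread:42": {"t0": 0.55, "t1": 0.45},
}
```

With these made-up values, the per-model winners are t1 (0.8) and t0 (0.55), so topic t1 of the sender-specific model would be taken as the inferred topic.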

Determining Retrieval Strategy

As indicated above, during step 850, the contextual information retrieval process 800 determines a retrieval strategy. The goal of contextual information retrieval is to provide relevant information to satisfy the user's context-sensitive information needs. Because the user's information needs vary depending on the message and context, the system dynamically determines its retrieval strategy for selecting sources to search, formulating queries, and processing search results in order to optimize the relevance and usefulness of the information it provides to the user within the limited space of the interface.

To help the contextual information retrieval process 800 infer the user's context-sensitive information needs, a set of context factors 900 is defined that represents different aspects of the current context. FIG. 9 is a sample table comprising an exemplary set of context factors 900. Each context factor has a normalized value between 0 and 1 so that the values of different context factors have the same scale. For each context factor identified in field 910, the context factor table 900 identifies the corresponding function for computing the context factor in field 920. FIG. 10 is a sample table comprising an exemplary set of retrieval factors 1000. A retrieval strategy is represented with a set of retrieval factors that imply different user information needs. The process of determining a retrieval strategy is thus the process of mapping from the values of context factors to those of retrieval factors. For each retrieval factor identified in field 1010, the retrieval factor table 1000 identifies the corresponding value set for the retrieval factor in field 1020.

The exemplary embodiment employs an instance-based mapping algorithm (S. Pan, “A Multi-Layer Conversation Management Approach for Information Seeking Applications,” ICSLP'04) due to its advantages in flexibility and extensibility over the rule-based algorithm with hard-coded rules. An instance consists of two parts: situation and decision. Using a vector representation, where each dimension of the vector corresponds to a particular context factor, a situation represents a particular value combination of context factors. A decision uses a similar vector representation to encode a particular value combination of retrieval factors. The instance-based algorithm matches the situation s derived from the current context against the situation s′ of each example instance, and returns the decision associated with the instance that has the best matched situation (i.e., the one with the smallest distance d calculated using Formula (4)):

$\begin{matrix}{d\left( {s,s^{\prime}} \right) = \sum\limits_{c}\left| {v\left( {c,s} \right) - v\left( {c,s^{\prime}} \right)} \right|} & (4)\end{matrix}$

where c denotes a context factor 900 listed in FIG. 9, v(c, s) and v(c, s′) denote the values of the context factor c in the situations s and s′, respectively, and |·| denotes the absolute value. A summation-based metric can be selected over the standard cosine similarity metric due to its simplicity, interpretability and low computational cost.
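A minimal sketch of the instance-based mapping with the distance of Formula (4) is given below. It is illustrative only: the context factor names ("interaction_frequency", "topic_familiarity") and the example instances are hypothetical, and treating a factor that is absent from a situation as 0.0 is an assumption of this sketch rather than something stated in the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instance:
    situation: Dict[str, float]  # context factor -> normalized value in [0, 1]
    decision: Dict[str, str]     # retrieval factor -> selected value


def distance(s: Dict[str, float], s_prime: Dict[str, float]) -> float:
    """Formula (4): sum over context factors of |v(c, s) - v(c, s')|."""
    factors = set(s) | set(s_prime)
    # Assumption: a factor missing from a situation counts as 0.0.
    return sum(abs(s.get(c, 0.0) - s_prime.get(c, 0.0)) for c in factors)


def choose_decision(current: Dict[str, float],
                    instances: List[Instance]) -> Dict[str, str]:
    """Return the decision of the example instance closest to `current`."""
    best = min(instances, key=lambda inst: distance(current, inst.situation))
    return best.decision


# Hypothetical example instances.
examples = [
    Instance({"interaction_frequency": 0.9, "topic_familiarity": 0.8},
             {"profile_type": "min", "doc_type": "detailed"}),
    Instance({"interaction_frequency": 0.1, "topic_familiarity": 0.2},
             {"profile_type": "long", "doc_type": "basic"}),
]
current_situation = {"interaction_frequency": 0.2, "topic_familiarity": 0.1}
strategy = choose_decision(current_situation, examples)
```

Because new example instances can simply be appended to the list, the mapping can be extended without editing any hard-coded rules, which is the flexibility the description attributes to the instance-based approach.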

The example instances may be determined empirically based on observations, or automatically learned using machine learning techniques based on explicit or implicit relevance feedback provided by the user through interactions with the system.

Formulating Queries

As indicated above, during step 860, the contextual information retrieval process 800 formulates queries. The queries used for retrieving people's profile information (from an enterprise employee directory 560) are created based on the name and email address of each person associated with the current message as a sender or direct recipient. The queries used for document retrieval (from the user's email database 590, hard drive 580, and online resources 570) are automatically generated based on the prominent words contained in the current message as well as the keywords associated with the inferred topic of the current message. Specifically, to determine query terms, the exemplary contextual information retrieval process 800 first extracts all the words from the subject and body of the current message (optionally excluding stopwords and “non-content” information such as signature, disclaimer, and text formatting markup) to create a context vector of terms. The weight of each term is its frequency of occurrence in the content of the current message multiplied by its inverse document frequency (derived from the number of existing messages in the user model that contain the term).

Next, the exemplary contextual information retrieval process 800 obtains the list of the 20 top-ranked keywords associated with the inferred topic of the current message. Because the weights of these keywords generated by the topic inference algorithm may not be compatible with the term weights in the context vector, they may not be directly combined. One approach to combining the two sets of term weights is for the contextual information retrieval process 800 to increase the weight of a context vector term by a fixed percentage if it occurs in the topic keyword list, thereby boosting terms that are representative of the inferred topic. An empirically determined value of 50% can be used.

Finally, the exemplary contextual information retrieval process 800 selects the 5 top-ranked context vector terms as the primary query terms. The 5 top-ranked topic keywords that are not stopwords, person names, or primary query terms are added as expansion query terms.
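The query term selection described above can be sketched as follows. This is an illustration under stated assumptions: the smoothed logarithmic form of the inverse document frequency is an assumption (the description does not give an exact formula), and the function name select_query_terms and its parameters are hypothetical. The 20-keyword list, the 50% boost, and the limits of 5 primary and 5 expansion terms follow the description.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple


def select_query_terms(
    message_words: Sequence[str],
    doc_freq: Dict[str, int],        # term -> number of messages containing it
    num_docs: int,                   # messages in the user model
    topic_keywords: Sequence[str],   # ranked keywords of the inferred topic
    stopwords: FrozenSet[str] = frozenset(),
    person_names: FrozenSet[str] = frozenset(),
    boost: float = 0.5,              # the 50% boost from the description
    n_primary: int = 5,
    n_expansion: int = 5,
) -> Tuple[List[str], List[str]]:
    top_keywords = list(topic_keywords[:20])
    keyword_set = set(top_keywords)

    # Context vector: term frequency times inverse document frequency.
    counts = Counter(w for w in message_words if w not in stopwords)
    weights = {}
    for term, tf in counts.items():
        # Assumed IDF form: smoothed logarithm, always positive.
        idf = math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        weight = tf * idf
        if term in keyword_set:
            weight *= 1.0 + boost    # boost terms representative of the topic
        weights[term] = weight

    primary = sorted(weights, key=lambda t: (-weights[t], t))[:n_primary]
    excluded = set(stopwords) | set(person_names) | set(primary)
    expansion = [k for k in top_keywords if k not in excluded][:n_expansion]
    return primary, expansion


primary_terms, expansion_terms = select_query_terms(
    message_words=["budget", "budget", "review", "meeting", "the"],
    doc_freq={"budget": 1, "review": 5, "meeting": 9, "the": 10},
    num_docs=10,
    topic_keywords=["budget", "forecast", "the", "alice", "quarter"],
    stopwords=frozenset({"the"}),
    person_names=frozenset({"alice"}),
)
```

In this toy input, "budget" ranks first because it is both frequent in the message and present in the topic keyword list, while "forecast" and "quarter" survive as expansion terms after stopwords, person names, and primary terms are excluded.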

For selected sources that support other search parameters in addition to query terms, the exemplary contextual information retrieval process 800 generates the values for these parameters based on the retrieval strategy determined. For example, if the value of the “docauthor” retrieval factor indicates that the system should restrict the retrieved online documents to those authored by the user/sender, the user/sender's identity is included in the query for searching online resources that support filtering results by person. If a source allows the query to specify how the retrieval results should be sorted (e.g., by date or by relevance), the exemplary contextual information retrieval process 800 includes the specification based on the value of the “sort_by” retrieval factor.

Processing Search Results

As indicated above, during step 870, the contextual information retrieval process 800 processes the search results. The top-ranked search results returned from the selected sources (e.g., based on source-specific cutoff thresholds) are further processed prior to being displayed in the interface 400. The means by which the results are processed depends on the values of the relevant retrieval factors, including “profile_type”, “doc_type”, and “msg_filter”.

To process a result returned from a directory of people's profiles, the exemplary contextual information retrieval process 800 checks the value of “profile_type” and extracts the required information, as discussed above in conjunction with FIGS. 1-3. A “min” profile requires basic information about a person such as name, email, and phone number. It serves the purpose of providing the user easy access to the contact information of people with whom the user interacts frequently. A “short” profile requires a person's photo in addition to the standard contact information, which can help refresh the user's memory about someone s/he has interacted with infrequently. A “long” profile requires the most information about a person, including contact information, photo, geography, and job description. It is the most comprehensive type of profile but also requires the most screen real estate in the interface. Therefore, it is only necessary for people with whom the user has little previous interaction.
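The three profile types can be pictured as progressively larger sets of fields. The sketch below is illustrative only; the field keys (such as "job_description") and the shape of a directory record are assumptions, since the description names the kinds of information but not a data format.

```python
from typing import Dict, Tuple

# Fields required by each "profile_type" value, following the description.
# The key names themselves are hypothetical.
PROFILE_FIELDS: Dict[str, Tuple[str, ...]] = {
    "min": ("name", "email", "phone"),
    "short": ("name", "email", "phone", "photo"),
    "long": ("name", "email", "phone", "photo", "geography",
             "job_description"),
}


def extract_profile(record: Dict[str, str], profile_type: str) -> Dict[str, str]:
    """Keep only the fields that the given profile type requires.

    Fields missing from the directory record are skipped silently
    (an assumption of this sketch).
    """
    return {f: record[f] for f in PROFILE_FIELDS[profile_type] if f in record}


# Hypothetical directory record.
record = {
    "name": "A. Example",
    "email": "a.example@example.com",
    "phone": "555-0100",
    "photo": "photo.jpg",
    "geography": "Building 1",
    "job_description": "Researcher",
    "internal_id": "12345",
}
short_profile = extract_profile(record, "short")
```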

For each document in the search results returned from the user's hard drive 580 or online resources 570, the exemplary contextual information retrieval process 800 determines its type and discards the document if the type is not included in the value of the “doc_type” retrieval factor. Currently, two types of documents are distinguished: “basic” for documents targeted at a general audience, and “detailed” for documents geared towards an audience with relevant background. The type of a document can be determined with a simple algorithm that classifies a document based on the extension of its name and its source. For example, a file with a “.ppt”, “.avi”, or “.wmv” name extension or a starting web page of an online community/wiki is considered “basic”, while a file with a “.doc”, “.ps”, etc. name extension or a web page from an online paper/patent database is considered “detailed”. If the need for more granular document types and a more sophisticated classification algorithm arises in applications, the system can easily be extended to incorporate them.
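A minimal sketch of such an extension-and-source classifier is shown below. The extension sets follow the examples in the description; the source labels ("community_home", "paper_database"), the dictionary layout of a search result, and the choice to discard documents whose type cannot be determined are assumptions of this sketch.

```python
import os
from typing import Dict, Iterable, List, Optional

BASIC_EXTENSIONS = {".ppt", ".avi", ".wmv"}
DETAILED_EXTENSIONS = {".doc", ".ps"}


def classify_document(name: str, source: Optional[str] = None) -> Optional[str]:
    """Return "basic", "detailed", or None when the type is unknown."""
    ext = os.path.splitext(name)[1].lower()
    if ext in BASIC_EXTENSIONS or source == "community_home":
        return "basic"
    if ext in DETAILED_EXTENSIONS or source == "paper_database":
        return "detailed"
    return None


def filter_by_doc_type(docs: Iterable[Dict[str, str]],
                       allowed_types: Iterable[str]) -> List[Dict[str, str]]:
    """Discard documents whose type is not in the "doc_type" value."""
    allowed = set(allowed_types)
    return [d for d in docs
            if classify_document(d["name"], d.get("source")) in allowed]


# Hypothetical search results.
results = [
    {"name": "overview.PPT"},
    {"name": "design.doc"},
    {"name": "index.html", "source": "paper_database"},
    {"name": "notes.txt"},
]
detailed_only = filter_by_doc_type(results, ["detailed"])
```

Keeping the extension sets and source labels in plain tables makes it straightforward to add more granular types later, in line with the extensibility noted above.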

For messages retrieved from the user's email database 590, the exemplary contextual information retrieval process 800 filters them based on the value of the “msg_filter” retrieval factor. If the value is “thread”, the system discards the messages that do not belong to the same thread as the current email message. If the value is “person” or “group”, the system discards the messages that are not associated with the person or group of the current message. To further improve the relevance of the messages to be presented to the user, the exemplary contextual information retrieval process 800 also discards the messages that do not have a strong association with the inferred topic of the current message.
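The filtering just described might look like the sketch below. Several details are assumptions: messages are modeled as dictionaries with hypothetical keys ("id", "thread_id", "people"), a message is taken to be associated with a group when it involves every member of that group, and a strong topic association is approximated by membership in the set of top-ranked messages of the inferred topic.

```python
from typing import Dict, List, Set


def filter_messages(messages: List[dict], current: Dict[str, object],
                    msg_filter: str, topic_message_ids: Set[int]) -> List[dict]:
    """Apply the "msg_filter" retrieval factor, then the topic filter."""
    kept = []
    for m in messages:
        people = set(m["people"])
        if msg_filter == "thread" and m["thread_id"] != current["thread_id"]:
            continue
        if msg_filter == "person" and current["person"] not in people:
            continue
        # Assumption: "group" requires every group member to be involved.
        if msg_filter == "group" and not set(current["group"]) <= people:
            continue
        # Discard messages not strongly associated with the inferred topic.
        if m["id"] not in topic_message_ids:
            continue
        kept.append(m)
    return kept


# Hypothetical data.
current_msg = {"thread_id": "t1", "person": "bob", "group": ["bob", "carol"]}
retrieved = [
    {"id": 1, "thread_id": "t1", "people": ["bob"]},
    {"id": 2, "thread_id": "t2", "people": ["bob", "carol"]},
    {"id": 3, "thread_id": "t1", "people": ["dave"]},
    {"id": 4, "thread_id": "t2", "people": ["bob", "carol"]},
]
on_topic = {1, 2, 3}
same_thread = filter_messages(retrieved, current_msg, "thread", on_topic)
```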

Finally, the exemplary contextual information retrieval process 800 performs duplicate removal, and removes redundancy from threaded messages by discarding any message whose content is already fully contained in the later messages of the same thread.
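One simple way to approximate this step is sketched below. It is not presented as the method of the exemplary embodiment: treating two messages as duplicates when their whitespace-normalized bodies are identical, and treating "fully contained" as a substring test against a single later message of the same thread, are both assumptions, as are the dictionary keys used.

```python
from typing import List


def remove_redundant(messages: List[dict]) -> List[dict]:
    """Drop duplicate messages, then drop any message whose body is
    contained in a later message of the same thread."""
    unique, seen = [], set()
    for m in messages:
        body = " ".join(m["body"].split())  # normalize whitespace
        if body in seen:
            continue                        # exact duplicate
        seen.add(body)
        unique.append({**m, "body": body})

    kept = []
    for m in unique:
        covered = any(
            other["thread_id"] == m["thread_id"]
            and other["timestamp"] > m["timestamp"]
            and m["body"] in other["body"]
            for other in unique
        )
        if not covered:
            kept.append(m)
    return kept


# Hypothetical retrieved messages.
msgs = [
    {"id": 1, "thread_id": "t1", "timestamp": 1, "body": "Can we meet Friday?"},
    {"id": 2, "thread_id": "t1", "timestamp": 2,
     "body": "Yes.  > Can we meet Friday?"},
    {"id": 3, "thread_id": "t1", "timestamp": 2,
     "body": "Yes. > Can we meet Friday?"},
    {"id": 4, "thread_id": "t2", "timestamp": 3, "body": "Unrelated note"},
]
deduplicated = remove_redundant(msgs)
```

Here message 3 is dropped as a duplicate of message 2, and message 1 is dropped because the later reply in the same thread quotes it in full.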

The disclosed automated contextual information retrieval email system 500 automatically provides relevant content at the fingertips of an email user based on the current email message the user is working with and a user model created from his/her email and calendar content. According to one aspect of the invention, a multi-tiered user model encodes basic, aggregate, and derived information, as well as topic models for the user's interaction with others at the level of persons, groups, and email threads. Such a fine-grained user model helps the system identify more accurately the topic of the current message and the user's information needs.

According to another aspect of the invention, the disclosed automated contextual information retrieval email system 500 employs a dynamic retrieval strategy that uses context information (e.g., how often the user interacts with the sender and the user's familiarity with the topic of the current message) to determine which sources to search, how to formulate queries, and how to process retrieval results to optimize the value of the information displayed within the limited space of the interface.

Exemplary System and Article of Manufacture Details

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

As noted, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon. Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Media block is a non-limiting example. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

FIG. 11 is a schematic block diagram of an automated contextual information retrieval (IR) system 1100 incorporating features of the present invention. Generally, as discussed further below, the contextual IR system 1100 employs a user model and a current email message and generates information relevant to the current context for efficient presentation to the user.

One or more embodiments of the invention, or elements thereof, can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform exemplary method steps. One or more embodiments can make use of software running on a general purpose computer or workstation. FIG. 11 depicts a computer system 1100 that may be useful in implementing one or more aspects and/or elements of the present invention. With reference to FIG. 11, such an implementation might employ, for example, a processor 1102, a memory 1104, and an input/output interface formed, for example, by a display 1106 and a keyboard 1108. The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor. The term “memory” is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory), ROM (read only memory), a fixed memory device (for example, hard drive), a removable memory device (for example, diskette), a flash memory and the like. In addition, the phrase “input/output interface” as used herein, is intended to include, for example, one or more mechanisms for inputting data to the processing unit (for example, mouse), and one or more mechanisms for providing results associated with the processing unit (for example, printer). The processor 1102, memory 1104, and input/output interface such as display 1106 and keyboard 1108 can be interconnected, for example, via bus 1110 as part of a data processing unit 1112. Suitable interconnections, for example via bus 1110, can also be provided to a network interface 1114, such as a network card, which can be provided to interface with a computer network, and to a media interface 1116, such as a diskette or CD-ROM drive, which can be provided to interface with media 1118. Analog-to-digital converter(s) 1120 may be provided to receive analog input, such as analog video feed, and to digitize same. Such converter(s) may be interconnected with system bus 1110.

Accordingly, computer software including instructions or code for performing the methodologies of the invention, as described herein, may be stored in one or more of the associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented by a CPU. Such software could include, but is not limited to, firmware, resident software, microcode, and the like.

A data processing system suitable for storing and/or executing program code will include at least one processor 1102 coupled directly or indirectly to memory elements 1104 through a system bus 1110. The memory elements can include local memory employed during actual implementation of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during implementation. Input/output or I/O devices (including but not limited to keyboards 1108, displays 1106, pointing devices, and the like) can be coupled to the system either directly (such as via bus 1110) or through intervening I/O controllers (omitted for clarity).

Network adapters such as network interface 1114 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

As used herein, including the claims, a “server” includes a physical data processing system running a server program. It will be understood that such a physical server may or may not include a display and keyboard.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the FIGS. illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Method steps described herein may be tied, for example, to a general purpose computer programmed to carry out such steps, or to hardware for carrying out such steps, as described herein. Further, method steps described herein, including, for example, obtaining data streams and encoding the streams, may also be tied to physical sensors, such as cameras or microphones, from whence the data streams are obtained.

It should be noted that any of the methods described herein can include an additional step of providing a system comprising distinct software modules embodied on a computer readable storage medium. The method steps can then be carried out using the distinct software modules and/or sub-modules of the system, as described above, executing on one or more hardware processors. In some cases, specialized hardware may be employed to implement one or more of the functions described here. Further, a computer program product can include a computer-readable storage medium with code adapted to be implemented to carry out one or more method steps described herein, including the provision of the system with the distinct software modules.

In any case, it should be understood that the components illustrated herein may be implemented in various forms of hardware, software, or combinations thereof; for example, application specific integrated circuit(s) (ASICS), functional circuitry, one or more appropriately programmed general purpose digital computers with associated memory, and the like. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the components of the invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

1. A method for automatically presenting content relevant to a current message, said method comprising: obtaining a multi-tiered user model containing a multi-tiered representation of pair-wise interactions of a first user with each of one or more contacts, wherein the multi-tiered representation includes a plurality of topic models, wherein each of the topic models corresponds to a pair-wise interaction between the first user and one of the contacts, wherein each of the topic models contains a set of topics, wherein each topic contains a list of topic keywords; extracting context information of a current message based on a content of the current message, one or more of a sender and a recipient of the current message, and the multi-tiered user model; determining a retrieval strategy based on the extracted context information of the current message; selecting from a set of information repositories for contextual information retrieval based on the determined retrieval strategy, wherein the set of information repositories comprises one or more of people directories, a local memory, one or more online repositories, an email repository and a calendar entry repository; generating one or more contextual queries to search the selected information repositories; and presenting the content relevant to the current message from the selected information repositories based on the one or more contextual queries.
 2. The method of claim 1, wherein the contacts comprise one or more of email contacts, calendar contacts, groups of email contacts and groups of calendar contacts.
 3. The method of claim 1, wherein the multi-tiered user model is obtained by extracting and aggregating information from email and calendar content of the first user, and applying statistical techniques to create topic models corresponding to pair-wise interactions of the first user with each of the contacts of said first user.
 4. The method of claim 1, wherein the current message comprises one or more of an email message, a text message and a transcribed voice mail message.
 5. The method of claim 1, wherein the contextual queries to search the selected information repositories are determined based on the content of the current message, the topic keywords contained in a determined topic of the current message, one or more of the sender and recipient of the current message, and the determined retrieval strategy.
 6. The method of claim 1, further comprising the step of determining the topic of the current message by selecting one or more topic models from the multi-tiered user model based on one or more of a sender and recipient of the current message, and matching the content of the current message to the topics of the selected topic models to find the best topic.
 7. The method of claim 1, wherein the content relevant to the current message is determined by processing the search results from the selected information repositories based on the context of the current message and the determined retrieval strategy.
 8. A system for automatically presenting content relevant to a current message, said system comprising: a memory; and at least one processor, coupled to the memory, operative to: obtain a multi-tiered user model containing a multi-tiered representation of pair-wise interactions of a first user with each of one or more contacts, wherein the multi-tiered representation includes a plurality of topic models, wherein each of the topic models corresponds to a pair-wise interaction between the first user and one of the contacts, wherein each of the topic models contains a set of topics, wherein each topic contains a list of topic keywords; extract context information of a current message based on a content of the current message, one or more of a sender and a recipient of the current message, and the multi-tiered user model; determine a retrieval strategy based on the extracted context information of the current message; select from a set of information repositories for contextual information retrieval based on the determined retrieval strategy, wherein the set of information repositories comprises one or more of people directories, a local memory, one or more online repositories, an email repository and a calendar entry repository; generate one or more contextual queries to search the selected information repositories; and present the content relevant to the current message from the selected information repositories based on the one or more contextual queries.
 9. The system of claim 8, wherein the multi-tiered user model is obtained by extracting and aggregating information from email and calendar content of the first user, and applying statistical techniques to create topic models corresponding to pair-wise interactions of the first user with each of the contacts of said first user.
 10. The system of claim 8, wherein the current message comprises one or more of an email message, a text message and a transcribed voice mail message.
 11. The system of claim 8, wherein the contextual queries to search the selected information repositories are determined based on the content of the current message, the topic keywords contained in a determined topic of the current message, one or more of the sender and recipient of the current message, and the determined retrieval strategy.
 12. The system of claim 8, wherein said processor is further configured to determine the topic of the current message by selecting one or more topic models from the multi-tiered user model based on one or more of a sender and recipient of the current message, and matching the content of the current message to the topics of the selected topic models to find the best topic.
 13. The system of claim 8, wherein the content relevant to the current message is determined by processing the search results from the selected information repositories based on the context of the current message and the determined retrieval strategy.
 14. An article of manufacture for automatically presenting content relevant to a current message, comprising a tangible machine readable recordable medium containing one or more programs which when executed implement the steps of: obtaining a multi-tiered user model containing a multi-tiered representation of pair-wise interactions of a first user with each of one or more contacts, wherein the multi-tiered representation includes a plurality of topic models, wherein each of the topic models corresponds to a pair-wise interaction between the first user and one of the contacts, wherein each of the topic models contains a set of topics, wherein each topic contains a list of topic keywords; extracting context information of a current message based on a content of the current message, one or more of a sender and a recipient of the current message, and the multi-tiered user model; determining a retrieval strategy based on the extracted context information of the current message; selecting from a set of information repositories for contextual information retrieval based on the determined retrieval strategy, wherein the set of information repositories comprises one or more of people directories, a local memory, one or more online repositories, an email repository and a calendar entry repository; generating one or more contextual queries to search the selected information repositories; and presenting the content relevant to the current message from the selected information repositories based on the one or more contextual queries.
 15. The article of manufacture of claim 14, wherein the contacts comprise one or more of email contacts, calendar contacts, groups of email contacts and groups of calendar contacts.
 16. The article of manufacture of claim 14, wherein the multi-tiered user model is obtained by extracting and aggregating information from email and calendar content of the first user, and applying statistical techniques to create topic models corresponding to pair-wise interactions of the first user with each of the contacts of said first user.
 17. The article of manufacture of claim 14, wherein the current message comprises one or more of an email message, a text message and a transcribed voice mail message.
 18. The article of manufacture of claim 14, wherein the contextual queries to search the selected information repositories are determined based on the content of the current message, the topic keywords contained in a determined topic of the current message, one or more of the sender and recipient of the current message, and the determined retrieval strategy.
 19. The article of manufacture of claim 14, further comprising the step of determining the topic of the current message by selecting one or more topic models from the multi-tiered user model based on one or more of a sender and recipient of the current message, and matching the content of the current message to the topics of the selected topic models to find the best topic.
 20. The article of manufacture of claim 14, wherein the content relevant to the current message is determined by processing the search results from the selected information repositories based on the context of the current message and the determined retrieval strategy.